Mining All Non-Derivable Frequent Itemsets



Toon Calders*
University of Antwerp, Belgium 

Bart Goethals 
University of Limburg, Belgium 



Abstract 

Recent studies on frequent itemset mining algorithms resulted in sig- 
nificant performance improvements. However, if the minimal support 
threshold is set too low, or the data is highly correlated, the number 
of frequent itemsets itself can be prohibitively large. To overcome this 
problem, recently several proposals have been made to construct a con- 
cise representation of the frequent itemsets, instead of mining all frequent 
itemsets. The main goal of this paper is to identify redundancies in the 
set of all frequent itemsets and to exploit these redundancies in order to 
reduce the result of a mining operation. We present deduction rules to 
derive tight bounds on the support of candidate itemsets. We show how 
the deduction rules allow for constructing a minimal representation for all 
frequent itemsets. We also present connections between our proposal and 
recent proposals for concise representations and we give the results of ex- 
periments on real-life datasets that show the effectiveness of the deduction 
rules. In fact, the experiments even show that in many cases, first mining 
the concise representation, and then creating the frequent itemsets from 
this representation outperforms existing frequent set mining algorithms. 

1 Introduction 

The frequent itemset mining problem [?] is by now well known. We are given
a set of items $\mathcal{I}$ and a database $\mathcal{D}$ of subsets of
$\mathcal{I}$, each together with a unique identifier. The elements of
$\mathcal{D}$ are called transactions. An itemset $I \subseteq \mathcal{I}$ is
some set of items; its support in $\mathcal{D}$, denoted by
$\mathit{support}(I, \mathcal{D})$, is defined as the number of transactions in
$\mathcal{D}$ that contain all items of $I$; and an itemset is called
$s$-frequent in $\mathcal{D}$ if its support in $\mathcal{D}$ exceeds $s$.
$\mathcal{D}$ and $s$ are omitted when they are clear from the context. The goal
is now, given a minimal support threshold and a database, to find all frequent
itemsets.
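
To make these definitions concrete, the following Python sketch counts supports
on a small hypothetical database. The database `DB`, the helper names `support`
and `is_frequent`, and the threshold used are assumptions made for illustration
only; they are not taken from the experiments in this paper.

```python
# Hypothetical toy database: each transaction is a set of single-letter items.
DB = [
    frozenset("ABC"),
    frozenset("AB"),
    frozenset("AC"),
    frozenset("ABCD"),
    frozenset("B"),
    frozenset("ACD"),
]

def support(itemset, db):
    """Number of transactions in db that contain every item of itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in db if itemset <= t)

def is_frequent(itemset, db, s):
    """s-frequent in the sense used above: the support exceeds s."""
    return support(itemset, db) > s

print(support("AB", DB))         # 3
print(is_frequent("BD", DB, 1))  # False
```

Note that the empty itemset is contained in every transaction, so its support
equals the number of transactions.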

The search space of this problem, all subsets of $\mathcal{I}$, is clearly huge.
Instead of generating and counting the supports of all these itemsets at once,
several solutions have been proposed to perform a more directed search through
all patterns. However, this directed search enforces several scans through the
database, which brings up another great cost, because these databases tend to be
very large, and hence they do not fit into main memory.

The standard Apriori algorithm [?] for solving this problem is based on the
monotonicity property: all supersets of an infrequent itemset must be infrequent. 
Hence, if an itemset is infrequent, then all of its supersets can be pruned from 
the search-space. An itemset is thus considered potentially frequent, also called 
a candidate itemset, only if all its subsets are already known to be frequent. 
In every step of the algorithm, all candidate itemsets are generated and their 
supports are then counted by performing a complete scan of the transaction 
database. This is repeated until no new candidate itemsets can be generated. 
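
The level-wise search described above can be sketched as follows. This is a
minimal illustration under stated assumptions, not the optimized implementation
used in the experiments: the toy database `DB`, the function names, and the
brute-force counting (instead of a single scan per level) are all assumptions.

```python
from itertools import combinations

# Hypothetical toy database of single-letter items.
DB = [frozenset(t) for t in ("ABC", "AB", "AC", "ABCD", "B", "ACD")]

def support(itemset, db):
    return sum(1 for t in db if itemset <= t)

def apriori(db, s):
    """Return {itemset: support} for all itemsets whose support exceeds s."""
    items = sorted({i for t in db for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while candidates:
        # Conceptually one database scan per level.
        level = {}
        for c in candidates:
            n = support(c, db)
            if n > s:
                level[c] = n
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-sets; keep unions of size k whose (k-1)-subsets
        # are all frequent (the monotonicity pruning step).
        unions = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [
            c for c in unions
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        ]
    return frequent

result = apriori(DB, 1)
print(len(result))  # 11
```

On this toy database the itemset BD has support 1 and is therefore infrequent
for s = 1, so ABD, BCD and ABCD are never counted.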

Recent studies on frequent itemset mining algorithms resulted in significant 
performance improvements. In the early days, the size of the database and the 
generation of a reasonable amount of frequent itemsets were considered the most 
costly aspects of frequent itemset mining, and most energy went into minimizing 
the number of scans through the database. However, if the minimal support 
threshold is set too low, or the data is highly correlated, the number of frequent 
itemsets itself can be prohibitively large. To overcome this problem, recently 
several proposals have been made to construct a concise representation of the 
frequent itemsets, instead of mining all frequent itemsets [14].

Our contributions The main goal of this paper is to present several new 
methods to identify redundancies in the set of all frequent itemsets and to 
exploit these redundancies, resulting in a concise representation of all frequent 
itemsets and significant performance improvements of a mining operation. 

1. We present a complete set of deduction rules to derive tight intervals on 
the support of candidate itemsets. 

2. We show how the deduction rules can be used to construct a minimal 
representation of all frequent itemsets, consisting of all frequent itemsets 
of which the exact support can not be derived, and present an algorithm 
that efficiently does so. 

3. Also based on these deduction rules, we present an efficient method to 
find the exact support of all frequent itemsets, that are not in this concise 
representation, without scanning the database. 

4. We present connections between our proposal and recent proposals for
concise representations, such as free sets [?], disjunction-free sets [7], and
closed sets [13]. We also show that known tricks to improve performance
of frequent itemset mining algorithms, such as used in MAXMINER [?]
and PASCAL [?], can be described in our framework.

5. We present several experiments on real-life datasets that show the effec- 
tiveness of the deduction rules. 

The outline of the paper is as follows. In Section 2 we introduce the deduction
rules. Section 3 describes how we can use the rules to reduce the set of frequent 
itemsets. In Section 4 we give an algorithm to efficiently find this reduced 
set, and in Section 5 we evaluate the algorithm empirically. Related work is 
discussed in depth in Section 6. 

2 Deduction Rules 

In all that follows, $\mathcal{I}$ is the set of all items and $\mathcal{D}$ is
the transaction database.

We will now describe sound and complete rules for deducing tight bounds
on the support of an itemset $I \subseteq \mathcal{I}$, if the supports of all
its subsets are given. In order to do this, we will not consider itemsets that
are not subsets of $I$, and we can assume that all items in $\mathcal{D}$ are
elements of $I$. Indeed, "projecting away" the other items in a transaction
database does not change the supports of the subsets of $I$.

Definition 2.1. ($I$-Projection) Let $I \subseteq \mathcal{I}$ be an itemset.

• The $I$-projection of a transaction $T$, denoted $\pi_I T$, is defined as
$\pi_I T := \{i \mid i \in T \cap I\}$.

• The $I$-projection of a transaction database $\mathcal{D}$, denoted
$\pi_I \mathcal{D}$, consists of all $I$-projected transactions from
$\mathcal{D}$.

Lemma 2.2. Let $I, J$ be itemsets, such that $I \subseteq J \subseteq
\mathcal{I}$. For every transaction database $\mathcal{D}$, the following holds:

$$\mathit{support}(I, \mathcal{D}) = \mathit{support}(I, \pi_J \mathcal{D}).$$
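
Lemma 2.2 can be checked mechanically on a small example. The sketch below uses
a hypothetical database `DB` and assumed helper names `support` and `project`;
it only illustrates the statement and is not a proof.

```python
# Hypothetical toy database of single-letter items.
DB = [frozenset(t) for t in ("ABC", "AB", "AC", "ABCD", "B", "ACD")]

def support(itemset, db):
    itemset = frozenset(itemset)
    return sum(1 for t in db if itemset <= t)

def project(db, J):
    """J-projection: intersect every transaction with J (duplicates are kept)."""
    J = frozenset(J)
    return [t & J for t in db]

# For every I contained in J, projecting on J leaves the support of I unchanged.
J = "AB"
for I in ("", "A", "B", "AB"):
    assert support(I, DB) == support(I, project(DB, J))
```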

Before we introduce the deduction rules, we introduce fractions and covers. 

Definition 2.3. ($I$-Fraction) Let $I, J$ be itemsets, such that $I \subseteq J
\subseteq \mathcal{I}$. The $I$-fraction of $\pi_J \mathcal{D}$, denoted by
$f_I^J(\mathcal{D})$, equals the number of transactions in $\pi_J \mathcal{D}$
that exactly consist of the set $I$.

If $\mathcal{D}$ is clear from the context, we will write $f_I^J$, and if
$J = \mathcal{I}$, we will write $f_I$. The support of an itemset $I$ is then
$\sum_{I \subseteq I' \subseteq \mathcal{I}} f_{I'}$.

Definition 2.4. (Cover) Let $I \subseteq \mathcal{I}$ be an itemset. The cover
of $I$ in $\mathcal{D}$, denoted by $\mathit{Cover}(I, \mathcal{D})$, consists
of all transactions in $\mathcal{D}$ that contain $I$.

Again, we will write $\mathit{Cover}(I)$ if $\mathcal{D}$ is clear from the
context.
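
Fractions and covers are easy to compute directly, which gives a quick check of
the identity that a support is a sum of fractions. The following sketch assumes
a hypothetical database `DB`; the names `cover`, `fraction` and
`supersets_within` are illustrative and not part of the paper.

```python
from itertools import combinations

# Hypothetical toy database of single-letter items.
DB = [frozenset(t) for t in ("ABC", "AB", "AC", "ABCD", "B", "ACD")]

def cover(I, db):
    """All transactions of db that contain I."""
    I = frozenset(I)
    return [t for t in db if I <= t]

def fraction(I, J, db):
    """f_I^J: number of transactions whose J-projection is exactly I."""
    I, J = frozenset(I), frozenset(J)
    return sum(1 for t in db if t & J == I)

def supersets_within(I, J):
    """All X with I <= X <= J."""
    I, J = frozenset(I), frozenset(J)
    rest = sorted(J - I)
    for r in range(len(rest) + 1):
        for extra in combinations(rest, r):
            yield I | frozenset(extra)

I, J = "AC", "ABC"
total = sum(fraction(X, J, DB) for X in supersets_within(I, J))
print(fraction(I, J, DB), len(cover(I, DB)), total)  # 2 4 4
```

Here the support of AC (the size of its cover) is recovered as the sum of the
AC-fraction and the ABC-fraction of the ABC-projection.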

Let $I, J \subseteq \mathcal{I}$ be itemsets, and $J = I \cup \{A_1, \ldots,
A_n\}$. Notice that $\mathit{Cover}(J) = \bigcap_{i=1}^{n} \mathit{Cover}(I \cup
\{A_i\})$, and that $|\bigcup_{i=1}^{n} \mathit{Cover}(I \cup \{A_i\})| =
|\mathit{Cover}(I)| - f_I^J$. From the well-known inclusion-exclusion principle
[10, p. 181] we learn

$$|\mathit{Cover}(I)| - f_I^J = \sum_{1 \le i \le n} |\mathit{Cover}(I \cup \{A_i\})| - \sum_{1 \le i < j \le n} |\mathit{Cover}(I \cup \{A_i, A_j\})| + \cdots - (-1)^{n} |\mathit{Cover}(J)|,$$

and since $\mathit{support}(I \cup \{A_{i_1}, \ldots, A_{i_\ell}\}) =
|\mathit{Cover}(I \cup \{A_{i_1}, \ldots, A_{i_\ell}\})|$, we obtain, after
solving for $\mathit{support}(J)$,

$$\mathit{support}(J) - (-1)^{|J-I|} f_I^J = \sum_{I \subseteq X \subsetneq J} (-1)^{|J-X|+1}\, \mathit{support}(X).$$

From now on, we will denote the sum on the right-hand side of this last equation
by $\sigma(I, J)$.

Since $f_I^J$ is always nonnegative, we obtain the following theorem.

Theorem 1. For all itemsets $I \subseteq J \subseteq \mathcal{I}$,
$\sigma(I, J)$ is a lower (upper) bound on $\mathit{support}(J)$ if $|J - I|$ is
even (odd). The difference $|\mathit{support}(J) - \sigma(I, J)|$ is given by
$f_I^J$.

We will refer to the rule involving $\sigma(I, J)$ as $\mathcal{R}_J(I)$ and
omit $J$ when clear from the context.
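
As a sanity check, the two smallest instances of the rules can be written out
explicitly. The items $A$ and $B$ below are placeholders for two distinct items
not contained in $I$; the first instance is the familiar monotonicity property.

```latex
% |J - I| = 1, J = I \cup \{A\}: an upper bound
\mathit{support}(I \cup \{A\}) \le \mathit{support}(I)

% |J - I| = 2, J = I \cup \{A, B\}: a lower bound
\mathit{support}(I \cup \{A, B\}) \ge
    \mathit{support}(I \cup \{A\}) + \mathit{support}(I \cup \{B\}) - \mathit{support}(I)
```

In both cases the gap between the two sides is exactly the number of
transactions whose projection on $J$ equals $I$.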

If for each subset $I \subset J$, the support $\mathit{support}(I, \mathcal{D})
= s_I$ is given, then the rules $\mathcal{R}_J(\cdot)$ allow for calculating
lower and upper bounds on the support of $J$. Let $l$ denote the greatest lower
bound we can derive with these rules, and $u$ the smallest upper bound we can
derive. Since the rules are sound, the support of $J$ must be in the interval
$[l, u]$. In [?], we show also that these bounds on the support of $J$ are
tight; i.e., for every smaller interval $[l', u'] \subset [l, u]$, we can find
a database $\mathcal{D}'$ such that for each subset $I$ of $J$,
$\mathit{support}(I, \mathcal{D}') = s_I$, but the support of $J$ is not within
$[l', u']$.

Theorem 2. For all itemsets $J \subseteq \mathcal{I}$, the rules
$\{\mathcal{R}_J(I) \mid I \subseteq J\}$ are sound and complete for deducing
bounds on the support of $J$ based on the supports of all subsets of $J$.

The proof of the completeness relies on the fact that for all $I \subseteq J$,
we have $\mathit{support}(I, \mathcal{D}) = \sum_{I \subseteq I' \subseteq J}
f_{I'}^J$. We can consider the linear program consisting of all these
equalities, together with the conditions $f_I^J \ge 0$ for all fractions
$f_I^J$. The existence of a database $\mathcal{D}'$ that satisfies the given
supports is equivalent to the existence of a solution to this linear program in
the $f_I^J$'s and $\mathit{support}(J, \mathcal{D}')$. From this equivalence,
tightness of the bounds can be proved. For the details of the proof we refer to
[?].

Example 2.5. Consider the following transaction database. 

Figure 1: Tight bounds on $s_{ABCD}$. $s_I$ denotes $\mathit{support}(I)$.

Figure 1 gives the rules to determine tight bounds on the support of $ABCD$.
Using these deduction rules, we derive the following bounds on $s_{ABCD}$
without counting in the database.

Lower bound: $s_{ABCD} \ge 1$ (Rule $\mathcal{R}(AC)$)
Upper bound: $s_{ABCD} \le 1$ (Rule $\mathcal{R}(A)$)

Therefore, we can conclude, without having to rescan the database, that the
support of $ABCD$ in $\mathcal{D}$ is exactly 1, while a standard monotonicity
check would yield an upper bound of 2.
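
The same kind of derivation can be reproduced programmatically. The sketch
below evaluates every rule for one itemset on a hypothetical database `DB`,
which is not the database of Example 2.5; the function names `sigma` and
`bounds` are likewise assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy database of single-letter items.
DB = [frozenset(t) for t in ("ABC", "AB", "AC", "ABCD", "B", "ACD")]

def support(itemset, db):
    return sum(1 for t in db if itemset <= t)

def sigma(I, J, supp):
    """Sum over I <= X < J of (-1)^(|J - X| + 1) * supp[X]."""
    I, J = frozenset(I), frozenset(J)
    rest = sorted(J - I)
    total = 0
    for r in range(len(rest)):              # proper subsets of J only
        for extra in combinations(rest, r):
            X = I | frozenset(extra)
            total += (-1) ** (len(J - X) + 1) * supp[X]
    return total

def bounds(J, supp):
    """Greatest lower and smallest upper bound over all rules for J.

    The trivial lower bound 0 is used as the starting value; the upper bound
    is None only for the empty itemset, which has no rules of odd depth.
    """
    J = frozenset(J)
    lower, upper = 0, None
    for r in range(len(J)):
        for sub in combinations(sorted(J), r):
            value = sigma(sub, J, supp)
            if (len(J) - r) % 2 == 0:       # |J - I| even: lower bound
                lower = max(lower, value)
            else:                           # |J - I| odd: upper bound
                upper = value if upper is None else min(upper, value)
    return lower, upper

# Supports of all proper subsets of ABCD, including the empty set.
J = frozenset("ABCD")
SUPP = {frozenset(c): support(frozenset(c), DB)
        for r in range(len(J)) for c in combinations(sorted(J), r)}
print(bounds(J, SUPP))  # (1, 1)
```

On this toy database the lower and upper bound coincide, so the support of ABCD
is derived without counting it.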

3 Non-Derivable Itemsets as a Concise Representation

Based on the deduction rules, it is possible to generate a summary of the set of
frequent itemsets. Indeed, suppose that the deduction rules allow for deducing
the support of a frequent itemset $I$ exactly, based on the supports of its
subsets. Then there is no need to explicitly count the support of $I$, which
would require a complete database scan; if we need the support of $I$, we can
always simply derive it using the deduction rules. Such a set $I$, of which we
can perfectly derive the support, will be called a Derivable Itemset (DI); all
other itemsets are called Non-Derivable Itemsets (NDIs). We will show in this
section that the set of frequent NDIs allows for computing the supports of all
other frequent itemsets, and as such, forms a concise representation [12] of the
frequent itemsets. To prove this result, we first need to show that when a set
$I$ is non-derivable, then also all its subsets are non-derivable. For each set
$I$, let $l_I$ ($u_I$) denote the lower (upper) bound we can derive using the
deduction rules.

Lemma 3.1. (Monotonicity) Let $I \subseteq \mathcal{I}$ be an itemset, and
$i \in \mathcal{I} - I$ an item. Then
$2|u_{I \cup \{i\}} - l_{I \cup \{i\}}| \le 2 \min(|\mathit{support}(I) - l_I|,
|\mathit{support}(I) - u_I|) \le |u_I - l_I|$.
In particular, if $I$ is a DI, then also $I \cup \{i\}$ is a DI.

Proof. The proof is based on the fact that $f_J^I = f_J^{I \cup \{i\}} +
f_{J \cup \{i\}}^{I \cup \{i\}}$. From Theorem 1 we know that $f_J^I$ is the
difference between the bound calculated by $\mathcal{R}_I(J)$ and the real
support of $I$. Let now $J$ be such that the rule $\mathcal{R}_I(J)$ calculates
the bound that is closest to the support of $I$. Then, the width of the interval
$[l_I, u_I]$ is at least $2 f_J^I$. Furthermore, $\mathcal{R}_{I \cup \{i\}}(J)$
and $\mathcal{R}_{I \cup \{i\}}(J \cup \{i\})$ are a lower and an upper bound on
the support of $I \cup \{i\}$ (if $|I \cup \{i\} - (J \cup \{i\})|$ is odd, then
$|I \cup \{i\} - J|$ is even and vice versa), and these bounds on $I \cup \{i\}$
differ respectively $f_J^{I \cup \{i\}}$ and $f_{J \cup \{i\}}^{I \cup \{i\}}$
from the real support of $I \cup \{i\}$. When we combine all these observations,
we get:

$$u_{I \cup \{i\}} - l_{I \cup \{i\}} \le f_J^{I \cup \{i\}} + f_{J \cup \{i\}}^{I \cup \{i\}} = f_J^I \le \tfrac{1}{2}(u_I - l_I). \qquad \Box$$

This lemma gives us the following valuable insights. 

Corollary 3.2. The width of the intervals exponentially shrinks with the size 
of the itemsets. 

This remarkable fact is a strong indication that the number of large NDIs 
will be very small. This reasoning will be supported by the results of the exper- 
iments. 

Corollary 3.3. If $I$ is a NDI, but it turns out that $\mathcal{R}_I(J)$ equals
the support of $I$, then all supersets $I \cup \{i\}$ of $I$ will be a DI, with
rules $\mathcal{R}_{I \cup \{i\}}(J)$ and $\mathcal{R}_{I \cup \{i\}}(J \cup
\{i\})$.

We will use this observation to avoid checking all possible rules for
$I \cup \{i\}$. This avoidance can be done in the following way: whenever we
calculate bounds on the support of an itemset $I$, we remember the lower and
upper bound $l_I$, $u_I$. If $I$ is a NDI; i.e., $l_I \ne u_I$, then we will
have to count its support. After we counted the support, the tests
$\mathit{support}(I) = l_I$ and $\mathit{support}(I) = u_I$ are performed. If
one of these two equalities holds, we know that all supersets of $I$ are
derivable, without having to calculate the bounds.

Corollary 3.4. If we know that $I$ is a DI, and that rule $\mathcal{R}_I(J)$
gives the exact support of $I$, then $\mathcal{R}_{I \cup \{i\}}(J \cup \{i\})$
gives the exact support for $I \cup \{i\}$.

Suppose that we want to build the entire set of frequent itemsets starting
from the concise representation. We can then use this observation to improve the
performance of deducing all supports. Suppose we need to deduce the support
of a set $I$, and of a superset $J$ of $I$; instead of trying all rules to find
the exact support for $J$, we know in advance, because we already evaluated $I$,
which rule to choose. Hence, for any itemset which is known to be a DI, we only
have to compute a single deduction rule to know its exact support.



From Lemma 3.1, we easily obtain the following theorem, saying that the set
of NDIs is a concise representation. We omit the proof due to space limitations.

Theorem 3. For every database $\mathcal{D}$, and every support threshold $s$,
let $\mathrm{NDI}(\mathcal{D}, s)$ be the following set:

$$\mathrm{NDI}(\mathcal{D}, s) := \{(I, \mathit{support}(I, \mathcal{D})) \mid l_I \ne u_I\}.$$

$\mathrm{NDI}(\mathcal{D}, s)$ is a concise representation for the frequent
itemsets, and for each itemset $J$ not in $\mathrm{NDI}(\mathcal{D}, s)$, we can
decide whether $J$ is frequent, and if $J$ is frequent, we can exactly derive
its support from the information in $\mathrm{NDI}(\mathcal{D}, s)$.



4 The NDI-Algorithm 

Based on the results in the previous section, we propose a level-wise algorithm
to find all frequent NDIs. Since derivability is monotone, we can prune an
itemset if it is derivable. This gives the NDI-algorithm as shown below. The
correctness of the algorithm follows from the results in Lemma 3.1.



NDI($\mathcal{D}$, s)

    i := 1; NDI := {}; C_1 := {{i} | i ∈ $\mathcal{I}$};
    for all I in C_1 do I.l := 0; I.u := |$\mathcal{D}$|;
    while C_i not empty do
        Count the supports of all candidates in C_i in one pass over $\mathcal{D}$;
        F_i := {I ∈ C_i | support(I, $\mathcal{D}$) > s};
        NDI := NDI ∪ F_i;
        Gen := {};
        for all I ∈ F_i do
            if support(I) ≠ I.l and support(I) ≠ I.u then
                Gen := Gen ∪ {I};
        PreC_{i+1} := AprioriGenerate(Gen);
        C_{i+1} := {};
        for all J ∈ PreC_{i+1} do
            Compute bounds [l, u] on support of J;
            if l ≠ u then J.l := l; J.u := u; C_{i+1} := C_{i+1} ∪ {J};
        i := i + 1
    end while
    return NDI
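
A compact executable rendering of this procedure may help to see how the pieces
fit together. The sketch below is written under several assumptions: the toy
database `DB` is hypothetical, supports are counted by brute force rather than
in one pass per level, all rules are evaluated without any depth limit, and the
names `sigma`, `bounds` and `ndi` are illustrative. It is not the optimized
implementation used in the experiments.

```python
from itertools import combinations

# Hypothetical toy database of single-letter items.
DB = [frozenset(t) for t in ("ABC", "AB", "AC", "ABCD", "B", "ACD")]

def support(itemset, db):
    return sum(1 for t in db if itemset <= t)

def sigma(I, J, supp):
    """Sum over I <= X < J of (-1)^(|J - X| + 1) * supp[X]."""
    rest = sorted(J - I)
    total = 0
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            X = I | frozenset(extra)
            total += (-1) ** (len(J - X) + 1) * supp[X]
    return total

def bounds(J, supp):
    """Tightest [l, u] obtained from all rules for J."""
    lower, upper = 0, None
    for r in range(len(J)):
        for sub in combinations(sorted(J), r):
            value = sigma(frozenset(sub), J, supp)
            if (len(J) - r) % 2 == 0:
                lower = max(lower, value)
            else:
                upper = value if upper is None else min(upper, value)
    return lower, upper

def ndi(db, s):
    """Frequent non-derivable itemsets (support > s) with their supports."""
    items = sorted({i for t in db for i in t})
    supp = {frozenset(): len(db)}            # counted supports used by the rules
    cands = {frozenset([i]): (0, len(db)) for i in items}
    result = {}
    k = 1
    while cands:
        gen = set()
        for I, (l, u) in cands.items():
            n = support(I, db)
            if n > s:
                supp[I] = result[I] = n
                if n != l and n != u:        # otherwise all supersets are derivable
                    gen.add(I)
        k += 1
        # AprioriGenerate: size-k unions whose (k-1)-subsets are all in gen.
        pre = {a | b for a in gen for b in gen if len(a | b) == k}
        pre = [J for J in pre
               if all(frozenset(c) in gen for c in combinations(J, k - 1))]
        cands = {}
        for J in pre:
            l, u = bounds(J, supp)
            if l != u:                       # derivable itemsets are pruned
                cands[J] = (l, u)
    return result

print(len(ndi(DB, 1)))  # 9
```

On this toy database, 11 itemsets have support greater than 1, but only 9 of
them are non-derivable: the supports of ABC and ACD follow from the rules.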

Figure 2: Average interval-width of candidate itemsets (x-axis: candidate
itemsets of size $k$).

Since evaluating all rules can be very cumbersome, in the experiments we show
what the effect is of only using a couple of rules. We will say that we use
rules up to depth $k$ if we only evaluate the rules $\mathcal{R}_J(I)$ for
$|J - I| \le k$. The experiments show that in most cases, the gain of evaluating
rules up to depth $k$ instead of up to depth $k - 1$ typically quickly decreases
if $k$ increases. Therefore, we can conclude that in practice most pruning is
done by the rules of limited depth.
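
Restricting the rules to a given depth only changes which subsets are
enumerated. The following sketch illustrates this on a hypothetical database;
the `depth` parameter and the function names are assumptions made for
illustration.

```python
from itertools import combinations

# Hypothetical toy database of single-letter items.
DB = [frozenset(t) for t in ("ABC", "AB", "AC", "ABCD", "B", "ACD")]

def support(itemset, db):
    return sum(1 for t in db if itemset <= t)

def sigma(I, J, supp):
    """Sum over I <= X < J of (-1)^(|J - X| + 1) * supp[X]."""
    rest = sorted(J - I)
    total = 0
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            X = I | frozenset(extra)
            total += (-1) ** (len(J - X) + 1) * supp[X]
    return total

def bounds(J, supp, depth):
    """Bounds on the support of J using only rules with |J - I| <= depth."""
    J = frozenset(J)
    lower, upper = 0, None
    for d in range(1, min(depth, len(J)) + 1):   # d = |J - I|
        for removed in combinations(sorted(J), d):
            value = sigma(J - frozenset(removed), J, supp)
            if d % 2 == 0:
                lower = max(lower, value)
            else:
                upper = value if upper is None else min(upper, value)
    return lower, upper

ITEMS = "ABCD"
SUPP = {frozenset(c): support(frozenset(c), DB)
        for r in range(len(ITEMS) + 1) for c in combinations(ITEMS, r)}

print(bounds("ABCD", SUPP, 1))  # (0, 1)
print(bounds("ABCD", SUPP, 2))  # (1, 1)
```

On this toy database the monotonicity check (depth 1) leaves the interval
[0, 1], while depth 2 already closes it; deeper rules add nothing here.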



5 Experiments 

For our experiments, we implemented an optimized version of the Apriori
algorithm and the NDI algorithm described in the previous section. We performed
our experiments on several real-life datasets with different characteristics,
among which a dataset obtained from a Belgian retail market, which is a
sparse dataset of 41 337 transactions over 13 103 items. The second dataset was
the BMS-Webview-1 dataset donated by Z. Zheng et al. [16], containing 59 602
transactions over 497 items. The third dataset is the dense census dataset as
available in the UCI KDD repository [?], which we transformed into a
transaction database by creating a different item for every attribute-value
pair, resulting in 32 562 transactions over 22 072 items. The results on all
these datasets were very similar and we will therefore only describe the results
for the latter dataset.

Figure 2 shows the average width of the intervals computed for all candidate
itemsets of size $k$. Naturally, the interval-width of the singleton candidate
itemsets is 32 562, and is not shown in the figure. In the second pass of the
NDI-algorithm, all candidate itemsets of size 2 are generated and their
intervals deduced. As can be seen, the average interval width of the candidate
itemsets of size 2 is 377. From then on, the interval sizes decrease
exponentially, as was predicted by Corollary 3.2.

Figure 3 shows the size of the concise representation of all NDIs compared
to the total number of frequent patterns as generated by Apriori, for varying
minimal support thresholds. If this threshold was set to 0.1%, there exist
990 097 frequent patterns of which only 162 821 are non-derivable. Again, this
supports the theoretical results obtained in the previous sections.

Figure 3: Size of concise representation.

In the last experiment, we compared the strength of evaluating the deduction 
rules up to a certain depth, and the time needed to generate all NDIs w.r.t. the 
given depth. Figure 4 shows the results. On the x-axis, we show the depth up to
which rules are evaluated. We denoted the standard Apriori monotonicity check 
by 0, although it is actually equivalent to the rules of depth 1. The reason for 
this is that we also used the other optimizations described in Section 3. More 
specifically, if the lower or upper bound of an itemset equals its actual support, 
we can prune its supersets, which is denoted as depth 1 in this figure. The 
left y-axis shows the number of NDIs w.r.t. the given depth and is represented 
by the line 'concise representation'. The line 'NDI' shows the time needed to 
generate these NDIs. The time is shown on the right y-axis. The 'NDI+DI' line 
shows the time needed to generate all NDIs plus the time needed to derive all 
DIs, resulting in all frequent patterns. As can be seen, the size of the concise 
representation drops quickly only using the rules of depth 1 and 2. From there 
on, higher depths result in a slight decrease of the number of NDIs. From depth 
4 on, this size stays the same, which is not that remarkable since the number of 
NDIs of these sizes is also small. The time needed to generate these sets is best 
if the rules are only evaluated up to depth 2. Still, the running time is almost 
always better than the time needed to generate all frequent itemsets (depth 
0), and is hardly higher for higher depths. For higher depths, the needed time 
increases, which is due to the number of rules that need to be evaluated. Also 
note that the total time required for generating all NDIs and deriving all DIs 
is also better than generating all frequent patterns at once, at depths 1, 2,
and 3. This is due to the fact that the NDI algorithm has to perform fewer scans
through the transaction database. For larger databases this would also happen 
for the other depths, since the derivation of all DIs requires no scan through the 
database at all. 

6 Related Work

6.1 Concise Representations 

In the literature, there exist already a number of concise representations for 
frequent itemsets. The most important ones are closed itemsets, free itemsets, 
and disjunction-free itemsets. We compare the different concise representations 
with the NDI-representation. 

Free sets or Generators [11] An itemset $I$ is called free if it has no
proper subset with the same support. We will denote the set of all frequent free
itemsets with FreqFree. In [?], the authors show that freeness is anti-monotone;
every subset of a free set must also be free. FreqFree itself is not a concise
representation for the frequent sets, unless the set
$\mathit{Border}(\mathit{FreqFree}) := \{I \subseteq \mathcal{I} \mid \forall J
\subset I : J \in \mathit{FreqFree} \wedge I \notin \mathit{FreqFree}\}$ is
added. We call the concise representation consisting of these two sets
ConFreqFree. Notice that free sets [?] and generators [11] are the same.
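
A direct test for freeness can be sketched as follows. The database `DB` and
the function names are hypothetical; the sketch only checks the definition and
ignores the frequency threshold and the border.

```python
# Hypothetical toy database of single-letter items.
DB = [frozenset(t) for t in ("ABC", "AB", "AC", "ABCD", "B", "ACD")]

def support(itemset, db):
    itemset = frozenset(itemset)
    return sum(1 for t in db if itemset <= t)

def is_free(itemset, db):
    """True if no proper subset has the same support.

    Checking the subsets obtained by dropping a single item is enough: support
    is monotone, so an equal-support proper subset forces an equal-support
    subset with exactly one item fewer.
    """
    itemset = frozenset(itemset)
    n = support(itemset, db)
    return all(support(itemset - {i}, db) != n for i in itemset)

print(is_free("AB", DB))  # True
print(is_free("AC", DB))  # False
```

Here AC is not free because C alone already has the same support as AC.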



Disjunction-free sets [7] or disjunction-free generators [11] Disjunction-free
sets are essentially an extension of free sets. A set $I$ is called
disjunction-free if there do not exist two items $i_1, i_2$ in $I$ such that
$\mathit{support}(I) = \mathit{support}(I - \{i_1\}) + \mathit{support}(I -
\{i_2\}) - \mathit{support}(I - \{i_1, i_2\})$. This rule is in fact our rule
$\mathcal{R}_I(I - \{i_1, i_2\})$. Notice that free sets are a special case of
this case, namely when $i_1 = i_2$. We will denote the set of frequent
disjunction-free sets by FreqDFree. Again, disjunction-freeness is
anti-monotone, and FreqDFree is not a concise representation of the set of
frequent itemsets, unless we add the border of FreqDFree. We call the concise
representation containing these two sets ConFreqDFree.
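
The disjunction-free condition can be tested in the same style. In the sketch
below, pairs with $i_1 = i_2$ are included so that the free-set condition is
covered as the special case mentioned above; the database `DB` and the function
names are assumptions made for illustration.

```python
from itertools import combinations_with_replacement

# Hypothetical toy database of single-letter items.
DB = [frozenset(t) for t in ("ABC", "AB", "AC", "ABCD", "B", "ACD")]

def support(itemset, db):
    itemset = frozenset(itemset)
    return sum(1 for t in db if itemset <= t)

def is_disjunction_free(itemset, db):
    """True if no pair i1, i2 (possibly equal) satisfies the equality above."""
    itemset = frozenset(itemset)
    n = support(itemset, db)
    for i1, i2 in combinations_with_replacement(sorted(itemset), 2):
        rhs = (support(itemset - {i1}, db)
               + support(itemset - {i2}, db)
               - support(itemset - {i1, i2}, db))
        if n == rhs:
            return False
    return True

print(is_disjunction_free("AB", DB))  # False
print(is_disjunction_free("BD", DB))  # True
```

AB is free on this database but not disjunction-free, since its support 3
equals support(A) + support(B) minus the number of transactions, 5 + 4 - 6.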

Closed itemsets [13] Another type of concise representation that received a
lot of attention in the literature [14] are the closed itemsets. They can
be introduced as follows: the closure of an itemset $I$ is the largest superset
of $I$ such that its support equals the support of $I$. This superset is unique
and is denoted by $cl(I)$. An itemset is called closed if it equals its closure.
We will denote the set of all frequent closed itemsets by FreqClosed. In [13],
the authors show that FreqClosed is a concise representation for the frequent
itemsets.
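
For an itemset with non-empty cover, the closure is the intersection of all
transactions that contain it. The sketch below uses this observation on a
hypothetical database; the function names and the treatment of itemsets with
support 0 are assumptions.

```python
# Hypothetical toy database of single-letter items.
DB = [frozenset(t) for t in ("ABC", "AB", "AC", "ABCD", "B", "ACD")]

def closure(itemset, db):
    """Largest superset of itemset with the same support."""
    itemset = frozenset(itemset)
    covering = [t for t in db if itemset <= t]
    if not covering:
        # Assumption: with an empty cover every superset also has support 0,
        # so the largest one is the set of all items.
        return frozenset().union(*db)
    return frozenset.intersection(*covering)

def is_closed(itemset, db):
    return closure(itemset, db) == frozenset(itemset)

print(sorted(closure("C", DB)))  # ['A', 'C']
print(is_closed("AB", DB))       # True
```

Every transaction containing C also contains A, so the closure of C is AC.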

In the following proposition we give connections between the different concise 
representations. 

Proposition 6.1. For every dataset and support threshold, the following in- 
equalities are valid. 

1. The set of frequent closed itemsets is always smaller or equal in cardinality 
than the set of frequent free sets. 

2. The set of NDIs is always a subset of ConFreqDFree. 

Proof. 1. We first show that $\mathit{Closed} = cl(\mathit{Free})$.

$\subseteq$: Let $C$ be a closed set. Let $I$ be a smallest subset of $C$ such
that $cl(I) = C$. Suppose $I$ is not a free set. Then there exists
$J \subset I$ such that $\mathit{support}(J) = \mathit{support}(I)$. This
however implies that $\mathit{support}(J) = \mathit{support}(C)$, and hence
$cl(J) = C$. This is in contradiction with the minimality of $I$.

$\supseteq$: Trivial, since $cl$ is idempotent.

This equality implies that $cl$ is always a surjective function from Free to
Closed, and therefore, $|\mathit{Free}| \ge |\mathit{Closed}|$.

2. Suppose $I$ is not in ConFreqDFree. If $I$ is not frequent, then the result
is trivially satisfied. Otherwise, this means that $I$ is not a frequent
disjunction-free set, and that there is at least one subset $J$ of $I$ that is
also not a frequent disjunction-free set (otherwise $I$ would be in the border
of FreqDFree). Therefore, there exist $i_1, i_2 \in J$ such that
$\mathit{support}(J) = \mathit{support}(J - \{i_1\}) + \mathit{support}(J -
\{i_2\}) - \mathit{support}(J - \{i_1, i_2\}) = \sigma(J - \{i_1, i_2\}, J)$.
We now conclude, using Lemma 3.1, that $I$ is a derivable itemset, and thus not
in NDI.

□
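
The first part of the proposition can be checked by brute force on a small
example. The sketch below enumerates all itemsets over a hypothetical database
without applying a support threshold, which is an assumption of this
illustration; it confirms that the closures of the free sets are exactly the
closed sets.

```python
from itertools import combinations

# Hypothetical toy database of single-letter items.
DB = [frozenset(t) for t in ("ABC", "AB", "AC", "ABCD", "B", "ACD")]

def support(itemset, db):
    return sum(1 for t in db if itemset <= t)

def is_free(itemset, db):
    n = support(itemset, db)
    return all(support(itemset - {i}, db) != n for i in itemset)

def closure(itemset, db):
    covering = [t for t in db if itemset <= t]
    # Assumption: an itemset with empty cover is closed by the set of all items.
    return frozenset.intersection(*covering) if covering else frozenset().union(*db)

ITEMS = sorted(frozenset().union(*DB))
ALL_SETS = [frozenset(c) for r in range(len(ITEMS) + 1)
            for c in combinations(ITEMS, r)]

free = [I for I in ALL_SETS if is_free(I, DB)]
closed = {I for I in ALL_SETS if closure(I, DB) == I}

# cl maps the free sets onto the closed sets.
assert {closure(I, DB) for I in free} == closed
print(len(free), len(closed))  # 8 8
```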

Other possible inclusions between the described concise representations do not
hold, i.e., for some datasets and support thresholds we have $|\mathrm{NDI}| <
|\mathit{Closed}|$, while for other datasets and support thresholds we have
$|\mathit{Closed}| < |\mathrm{NDI}|$. We omit the proof of this due to space
limitations. We should however mention that even though ConFreqDFree is always a
superset of NDI, in the experiments the gain of evaluating the extra rules is
often small. In many cases the reduction of ConFreqDFree, which corresponds to
evaluating rules up to depth 2 in our framework, is almost as big as the
reduction using the whole set of rules. Since our rules are complete, this shows
that additional gain is in many cases unlikely.

6.2 Counting Inference

MAXMINER [?] In MAXMINER, Bayardo uses the following rule to derive
a lower bound on the support of an itemset:

$$\mathit{support}(I \cup \{i\}) \ge \mathit{support}(I) - \sum_{j \in T} \mathit{drop}(J, j)$$

with $T = I - J$, $J \subseteq I$, and $\mathit{drop}(J, j) =
\mathit{support}(J) - \mathit{support}(J \cup \{j\})$. This derivation
corresponds to repeated application of rules $\mathcal{R}_I(I - \{i_1, i_2\})$.

PASCAL [?] In their PASCAL algorithm, Bastide et al. use counting inference
to avoid counting the support of all candidates. The rule they are using to
avoid counting is based on our rule $\mathcal{R}_I(I - \{i\})$. In fact the
PASCAL algorithm corresponds to our algorithm when we only check rules up to
depth 1, and do not prune derivable sets. Instead of counting the derivable
sets, we use the derived support. Here the same remark as with the ConFreqDFree
representation applies; although PASCAL does not use all rules, in many cases
the performance comes very close to evaluating all rules, showing that for these
databases PASCAL is nearly optimal.